L1 and L2 regularization for multiclass hinge loss models
Authors
Abstract
This paper investigates the relationship between the loss function, the type of regularization, and the resulting sparsity of discriminatively trained multiclass linear models. The effects of optimizing log loss on sparsity are straightforward: L2 regularization produces very dense models, while L1 regularization produces much sparser models. Optimizing hinge loss, however, yields more nuanced behavior. We give experimental evidence and theoretical arguments that, for a class of problems that arises frequently in natural language processing, both L1- and L2-regularized hinge loss lead to sparser models than L2-regularized log loss, but to less sparse models than L1-regularized log loss. Furthermore, we give evidence and arguments that, for models with only indicator features, there is a critical threshold on the weight of the regularizer below which L1- and L2-regularized hinge loss tend to produce models of similar sparsity.
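One mechanism that can make L2-regularized hinge loss sparser than L2-regularized log loss can be shown on a toy problem. The sketch below is a hedged illustration under stated assumptions, not the paper's method, data, or argument. It trains a two-class linear model on four hypothetical examples with indicator features, using plain online (sub)gradient descent with L2 regularization applied as per-step weight decay. Under the Crammer and Singer multiclass hinge loss, an example that already satisfies the margin contributes a zero subgradient, so a feature that occurs only in such examples never moves away from zero. Under log loss, softmax probabilities lie strictly between 0 and 1, so every feature that occurs in any example receives a nonzero update. The names `DATA`, `hinge_step`, `log_step`, `train`, and `nonzeros`, and the values of `eta` and `lam`, are hypothetical choices made for this illustration.

```python
import math

# Hypothetical toy data: (set of active indicator features, gold class).
# Feature 2 is "rare": it only co-occurs with two strong class-0 features.
DATA = [({0}, 0), ({3}, 0), ({1}, 1), ({0, 2, 3}, 0)]
NUM_CLASSES, NUM_FEATURES = 2, 4


def scores(w, feats):
    """Class scores for an example whose features are binary indicators."""
    return [sum(w[c][f] for f in feats) for c in range(NUM_CLASSES)]


def hinge_step(w, feats, y, eta):
    """Subgradient step on the Crammer and Singer multiclass hinge loss."""
    s = scores(w, feats)
    # Loss-augmented scores: each wrong class must trail the gold class by 1.
    aug = [s[c] + (0.0 if c == y else 1.0) for c in range(NUM_CLASSES)]
    worst = max(range(NUM_CLASSES), key=lambda c: aug[c])
    if aug[worst] - s[y] > 0:  # zero loss means zero subgradient: no update
        for f in feats:
            w[y][f] += eta
            w[worst][f] -= eta


def log_step(w, feats, y, eta):
    """Gradient step on multiclass log loss (softmax cross-entropy)."""
    s = scores(w, feats)
    m = max(s)
    z = [math.exp(v - m) for v in s]  # shift by max for numerical stability
    total = sum(z)
    for c in range(NUM_CLASSES):
        g = z[c] / total - (1.0 if c == y else 0.0)  # never exactly zero
        for f in feats:
            w[c][f] -= eta * g


def train(step, epochs=50, eta=1.0, lam=0.1):
    """Online training; L2 penalty (lam/2)*||w||^2 applied as weight decay."""
    w = [[0.0] * NUM_FEATURES for _ in range(NUM_CLASSES)]
    for _ in range(epochs):
        for feats, y in DATA:
            shrink = 1.0 - eta * lam
            w = [[shrink * v for v in row] for row in w]
            step(w, feats, y, eta)
    return w


def nonzeros(w):
    """Number of weights that are not exactly zero."""
    return sum(1 for row in w for v in row if v != 0.0)


if __name__ == "__main__":
    w_hinge, w_log = train(hinge_step), train(log_step)
    print("hinge:", nonzeros(w_hinge), "nonzero; feature 2:",
          [row[2] for row in w_hinge])
    print("log:  ", nonzeros(w_log), "nonzero; feature 2:",
          [row[2] for row in w_log])
```

In this setup, the hinge-loss model leaves both weights for feature 2 at exactly zero (six of the eight weights are nonzero), because the fourth example always clears the margin through features 0 and 3. The log-loss model gives all eight weights nonzero values. This only illustrates the L2-hinge versus L2-log contrast on a hand-built example; it does not address the L1 results or the critical-threshold claim.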
Similar resources
A Study on L2-Loss (Squared Hinge-Loss) Multiclass SVM
Crammer and Singer's method is one of the most popular multiclass support vector machines (SVMs). It considers L1 loss (hinge loss) in a complicated optimization problem. In SVM, squared hinge loss (L2 loss) is a common alternative to L1 loss, but surprisingly we have not seen any paper studying the details of Crammer and Singer's method using L2 loss. In this letter, we conduct a thorough inve...
The Lq Support Vector Machine
The standard Support Vector Machine (SVM) minimizes the hinge loss function subject to the L2 penalty or the roughness penalty. Recently, the L1 SVM was suggested for variable selection by producing sparse solutions [BM, ZHRT]. These learning methods are non-adaptive since their penalty forms are pre-determined before looking at data, and they often perform well only in a certain type of situat...
Efficient variable selection in support vector machines via the alternating direction method of multipliers
The support vector machine (SVM) is a widely used tool for classification. Although commonly understood as a method of finding the maximum-margin hyperplane, it can also be formulated as a regularized function estimation problem, corresponding to a hinge loss function plus an l2-norm regularization term. The doubly regularized support vector machine (DrSVM) is a variant of the standard SVM, which i...
Stochastic functional descent for learning Support Vector Machines
We present a novel method for learning Support Vector Machines (SVMs) in the online setting. Our method is generally applicable in that it handles the online learning of the binary, multiclass, and structural SVMs in a unified view. The SVM learning problem consists of optimizing a convex objective function that is composed of two parts: the hinge loss and quadratic (L2) regularization. To date...
A Proximal Approach for Sparse Multiclass SVM
Sparsity-inducing penalties are useful tools to design multiclass support vector machines (SVMs). In this paper, we propose a convex optimization approach for efficiently and exactly solving the multiclass SVM learning problem involving a sparse regularization and the multiclass hinge loss formulated by [1]. We provide two algorithms: the first one dealing with the hinge loss as a penalty term,...
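The proximal-approach entry above and the abstract's L1 versus L2 contrast both turn on how each penalty treats small weights. As a hedged sketch, the code below shows the standard proximal operators of the two penalties inside one generic ISTA-style proximal-gradient step. It is not the algorithm of that paper, whose details are truncated here. The helpers `prox_l1`, `prox_l2`, and `proximal_gradient_step`, and the example gradient values, are hypothetical. The L1 operator (soft-thresholding) sets every coordinate with magnitude at most `t` to exactly zero, while the L2 operator only rescales, so it never produces a zero from a nonzero value.

```python
def prox_l1(w, t):
    """Proximal operator of t*||w||_1: coordinate-wise soft-thresholding."""
    return [0.0 if abs(v) <= t else v - t * (1.0 if v > 0 else -1.0) for v in w]


def prox_l2(w, t):
    """Proximal operator of (t/2)*||w||_2^2: uniform rescaling toward zero."""
    return [v / (1.0 + t) for v in w]


def proximal_gradient_step(w, grad, eta, lam, prox):
    """Gradient step on the loss, then the prox of eta*lam*penalty."""
    z = [v - eta * g for v, g in zip(w, grad)]
    return prox(z, eta * lam)


if __name__ == "__main__":
    w0 = [0.0, 0.0, 0.0, 0.0]
    grad = [-0.05, 0.8, -0.02, -1.3]  # hypothetical (sub)gradient of the loss
    print("L1:", proximal_gradient_step(w0, grad, 1.0, 0.1, prox_l1))
    print("L2:", proximal_gradient_step(w0, grad, 1.0, 0.1, prox_l2))
```

With these hypothetical numbers, the L1 step zeroes the two small coordinates and keeps two nonzero, while the L2 step keeps all four nonzero, only shrunk.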